nash equilibria
Countering Feedback Delays in Multi-Agent Learning
We consider a model of game-theoretic learning based on online mirror descent (OMD) with asynchronous and delayed feedback information. Instead of focusing on specific games, we consider a broad class of continuous games defined by the general equilibrium stability notion, which we call λ-variational stability. Our first contribution is that, in this class of games, the actual sequence of play induced by OMD-based learning converges to Nash equilibria provided that the feedback delays faced by the players are synchronous and bounded. Subsequently, to tackle fully decentralized, asynchronous environments with (possibly) unbounded delays between actions and feedback, we propose a variant of OMD which we call delayed mirror descent (DMD), and which relies on the repeated leveraging of past information. With this modification, the algorithm converges to Nash equilibria with no feedback synchronicity assumptions and even when the delays grow superlinearly relative to the horizon of play.
Deep Learning Games
We investigate a reduction of supervised learning to game playing that reveals new connections and learning methods. For convex one-layer problems, we demonstrate an equivalence between global minimizers of the training problem and Nash equilibria in a simple game. We then show how the game can be extended to general acyclic neural networks with differentiable convex gates, establishing a bijection between the Nash equilibria and critical (or KKT) points of the deep learning problem. Based on these connections we investigate alternative learning methods, and find that regret matching can achieve competitive training performance while producing sparser models than current deep learning approaches.
Learning in Games with Lossy Feedback
We consider a game-theoretical multi-agent learning problem where the feedback information can be lost during the learning process and rewards are given by a broad class of games known as variationally stable games. We propose a simple variant of the classical online gradient descent algorithm, called reweighted online gradient descent (ROGD) and show that in variationally stable games, if each agent adopts ROGD, then almost sure convergence to the set of Nash equilibria is guaranteed, even when the feedback loss is asynchronous and arbitrarily corrrelated among agents. We then extend the framework to deal with unknown feedback loss probabilities by using an estimator (constructed from past data) in its replacement. Finally, we further extend the framework to accomodate both asynchronous loss and stochastic rewards and establish that multi-agent ROGD learning still converges to the set of Nash equilibria in such settings. Together, these results contribute to the broad lanscape of multi-agent online learning by significantly relaxing the feedback information that is required to achieve desirable outcomes.
- North America > Canada > Alberta (0.14)
- North America > United States > Texas (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (2 more...)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Mathematics of Computing (0.64)
- Information Technology > Data Science > Data Mining > Big Data (0.46)
- Europe > Switzerland > Vaud > Lausanne (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Denmark (0.04)
- North America > United States > California > Orange County > Irvine (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (2 more...)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- North America > United States > Texas (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Quebec > Montreal (0.04)